The QR factorization and the SVD are two fundamental matrix decompositionswith applications throughout scientific computing and data analysis. Formatrices with many more rows than columns, so-called "tall-and-skinnymatrices," there is a numerically stable, efficient, communication-avoidingalgorithm for computing the QR factorization. It has been used in traditionalhigh performance computing and grid computing environments. For MapReduceenvironments, existing methods to compute the QR decomposition use anumerically unstable approach that relies on indirectly computing the Q factor.In the best case, these methods require only two passes over the data. In thispaper, we describe how to compute a stable tall-and-skinny QR factorization ona MapReduce architecture in only slightly more than 2 passes over the data. Wecan compute the SVD with only a small change and no difference in performance.We present a performance comparison between our new direct TSQR method, astandard unstable implementation for MapReduce (Cholesky QR), and the classicstable algorithm implemented for MapReduce (Householder QR). We find that ournew stable method has a large performance advantage over the Householder QRmethod. This holds both in a theoretical performance model as well as in anactual implementation.
展开▼